8.  The  Problem  Studied:

The  major  shortcoming  of  robust  methodology  in  statistical  linear  models 
has  been  a  limitation  primarily  to  parameter  estimation  in  fixed  effects  models. 
The  major  competitor,  classical  least  squares  methodology,  by  contrast  offers 
a  unified  treatment  of  estimation,  testing,  and  multiple  comparisons  techniques 
in  a  wide  range  of  fixed  and  random  effects  models.  Our  research  efforts  have 
been  directed  toward  filling  the  void  in  robust  methods  in  order  to  provide  a 
complete  alternative  to  least  squares  procedures. 

Define the linear model

$Y = X\beta + e$  (1)

where $Y$ is an $n \times 1$ vector of observations, $X$ is an $n \times p$ design matrix of
rank $k$, $\beta$ is a $p \times 1$ vector of unknown parameters, and $e$ is an $n \times 1$ vector
of random errors. We will consider at present the fixed effects case where the
components of $e$ are independent; the random effects case allows a more general
covariance structure and will be considered later. A comprehensive review and
discussion appears in McKean and Schrader (1981).

A least squares estimate of $\beta$, $\hat\beta_{LS}$, is found by minimizing the dispersion
function

$D_{LS}(\beta) = \sum_i (y_i - x_i\beta)^2$  (2)

where $x_i$ is the $i$th row of $X$. An M-estimate, $\hat\beta_M$, is found by minimizing

$D_M(\beta) = \sum_i \rho(y_i - x_i\beta)$  (3)

for an appropriate function $\rho$; see Huber (1973). An R-estimate, $\hat\beta_R$, is found
by minimizing

$D_R(\beta) = \sum_i a(R|y_i - x_i\beta|)\,|y_i - x_i\beta|$  (4)

where $a(\cdot)$ is an appropriate score function on the integers and $R|u_i|$ denotes
the rank of $|u_i|$ among $|u_1|, \ldots, |u_n|$; see Jaeckel (1972). Note that an $\ell_1$
estimate corresponds either to $a(i) \equiv 1$ or $\rho(x) = |x|$ in (4) or (3),
respectively; hence an $\ell_1$ estimate is both an R- and M-estimate. Under
regularity conditions, all of these estimators are asymptotically normal with
variance-covariance matrix $K^2 (X'X)^-$, where $K$ is a scale parameter which depends
upon the distribution of errors and the estimate selected, and $(X'X)^-$ is
an appropriate generalized inverse of $(X'X)$.
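
The three dispersion functions can be compared numerically. The following is a minimal sketch, not the authors' software: it assumes that (4) takes the signed-rank form in which each absolute residual is weighted by the score of its rank, and the function names, the Huber tuning constant, and the data are all illustrative. It checks that the choice rho(x) = |x| in (3) and the constant score a(i) = 1 in (4) give the same criterion, the sum of absolute residuals.

```python
import numpy as np

def residuals(beta, X, y):
    """Residuals y_i - x_i beta for a candidate parameter vector."""
    return y - X @ beta

def d_ls(beta, X, y):
    """Least squares dispersion (2): sum of squared residuals."""
    return float(np.sum(residuals(beta, X, y) ** 2))

def d_m(beta, X, y, rho):
    """M-type dispersion (3): sum of rho applied to each residual."""
    return float(np.sum(rho(residuals(beta, X, y))))

def d_r(beta, X, y, score):
    """Rank-based dispersion (4), assumed form: sum of a(rank|r_i|) * |r_i|."""
    abs_r = np.abs(residuals(beta, X, y))
    ranks = np.argsort(np.argsort(abs_r)) + 1  # ranks 1..n; ties broken by position
    return float(np.sum(score(ranks) * abs_r))

def huber_rho(u, c=1.345):
    """Huber's rho: quadratic near zero, linear in the tails (c is illustrative)."""
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u ** 2, c * a - 0.5 * c ** 2)

# Hypothetical data: a straight line with one gross outlier in the last response.
X = np.column_stack([np.ones(4), np.arange(4.0)])
y = np.array([0.1, 1.2, 1.9, 10.0])
beta = np.array([0.0, 1.0])

l1_as_m = d_m(beta, X, y, np.abs)                     # rho(x) = |x| in (3)
l1_as_r = d_r(beta, X, y, lambda i: np.ones(len(i)))  # a(i) = 1 in (4)
print(d_ls(beta, X, y), d_m(beta, X, y, huber_rho), l1_as_m, l1_as_r)
```

In this example the outlier dominates the least squares criterion through its squared residual, while it enters the Huber and absolute-error criteria only linearly.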

General linear hypotheses are expressed as

$H_0 : H\beta = 0$  vs.  $H_1 : H\beta \neq 0$  (5)

for a $q \times p$ matrix $H$ of rank $q$. The model (1) is the full model and (1) subject
to $H_0$ is the reduced model. The classical F-statistic for testing $H_0$ is

$F_{LS} = [(D_{LS}(R) - D_{LS}(F))/(k-q)]/\hat\sigma^2$  (6)

where $D_{LS}(R)$ and $D_{LS}(F)$ denote minimum values of (2) under reduced and full
models, respectively, and $\hat\sigma^2$ is an estimate of $\mathrm{Var}(e_i)$. We have proposed
robust test statistics, $F_M$ and $F_R$, similar to $F_{LS}$, where

$F_M = [(D_M(R) - D_M(F))/(k-q)]/\hat{A}_M$  (7)

with $D_M(R)$ and $D_M(F)$ the reduced and full model values of (3). $F_R$ is
defined similarly.
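
The common structure of (6) and (7), a reduction in dispersion divided by degrees of freedom and then by a scale estimate, can be sketched as follows. This is an illustration under stated assumptions: the helper names and the data are hypothetical, only the least squares case is fitted, and the numerator degrees of freedom are passed in by the caller rather than computed, so the sketch does not commit to a convention for that divisor. A robust version would substitute the minimized values of (3) or (4) and the corresponding standardizing constant.

```python
import numpy as np

def min_ls_dispersion(X, y):
    """Minimum of the least squares dispersion (2) over beta for design X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # handles non-full-rank X
    resid = y - X @ beta
    return float(resid @ resid)

def dispersion_ratio(d_reduced, d_full, df_num, scale):
    """Generic ratio: [(D(R) - D(F)) / df_num] / scale."""
    return ((d_reduced - d_full) / df_num) / scale

# Hypothetical data: full model has intercept and slope, reduced model intercept only.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 2.0, 4.0])
X_full = np.column_stack([np.ones_like(x), x])
X_reduced = np.ones((x.size, 1))

d_full = min_ls_dispersion(X_full, y)
d_reduced = min_ls_dispersion(X_reduced, y)
sigma2_hat = d_full / (x.size - 2)  # residual mean square of the full model
f_ls = dispersion_ratio(d_reduced, d_full, df_num=1, scale=sigma2_hat)
print(d_reduced, d_full, f_ls)
```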

We published asymptotic distribution theory of $F_R$ and $F_M$ in McKean and
Hettmansperger (1976) and Schrader and Hettmansperger (1980), respectively. This
asymptotic theory, along with asymptotic theory of the estimator, $\hat\beta$, provides useful
guidelines on appropriate standardizing constants and null distribution theory.

We investigated small sample properties of these procedures during the contract.
Specifically, various proposals for estimating the standardizing constants $K$
and $A$ were examined. The intent of the research was to provide a good
approximation to the small sample distributions of $\hat\beta$, $F_M$, and $F_R$ in order
to provide reasonable confidence and inference procedures.

9.  Summary  of  Major  Results: 

Asymptotic theory for $F_R$ was developed previously by McKean and Hettmansperger
(1976, 1978). During the term of this contract, Schrader and Hettmansperger (1980)
established similar asymptotic theory for $F_M$ and certain variants of $F_M$. In
this work the connection between $F_M$ and $\hat\beta_M$ was shown to be the same as the
connection between actual maximum likelihood estimates and likelihood ratio tests.

The $\ell_1$, or least absolute errors, estimate has received considerable
attention in recent years. As noted previously this is technically both an M- and
an R-estimate. Because the score function involved fails to meet certain
smoothness criteria, the distribution theory cited above for $F_M$ and $F_R$ does
not apply to a similar "F-ratio." In Schrader and McKean (1981) we established
asymptotic theory for the $\ell_1$ procedure and investigated its small sample
behavior.

A  coordinate-free  approach  to  classical  linear  models  theory,  as  discussed 
by  Kruskal  (1968),  greatly  enhances  the  interpretability  of  least  squares  techniques. 
In  McKean  and  Schrader  (1980a),  we  developed  a  similar  geometric  formulation  of 
robust  methods,  providing  a  corresponding  simple  interpretation.  In  this  work 
we  discussed  ideas  of  estimable  functions  and  testable  hypotheses  in  cases  where 
expectations  of  errors  need  not  exist,  and  demonstrated  that  robust  methods 
involve  only  replacing  the  least  squares  (Euclidean)  norm  by  another  appropriate  norm. 

In order to gain insight into the problem of estimating the standardizing
constants $K$ and $A$, we began Monte Carlo work with the $\ell_1$ estimate, which is
both an R- and M-estimate. The constant $A$ in this case is $A = [4f(\eta)]^{-1}$,
and the constant $K$ is $[2f(\eta)]^{-1}$, where $f$ is the underlying density of the
errors and $\eta$ is the median of this distribution. Since $K = 2A$, as is the
case for all R- and M-estimates, we began with the simple location problem and
examined how well various estimates of $K$ "Studentize" the sample median
(the $\ell_1$ estimate in this case). In a large Monte Carlo study of this problem
(McKean and Schrader, 1980b) we found that two estimates of $K$, hence also of
$A$, served to standardize the median in a much more stable manner than several
others: the bootstrap estimate of Efron (1979) and a standardized confidence
interval length similar to a proposal by Lehmann (1963). Surprisingly, we
discovered that the best small sample approximating distribution for a Studentized
median is the standard normal; it would be expected that a heavier tailed
distribution such as Student's t would be appropriate.
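
The idea of Studentizing the sample median with a bootstrap scale estimate can be sketched as follows. This is only one plausible construction, stated as an assumption: the standard deviation of resampled medians is taken as the standard error, and multiplying it by the square root of n gives an estimate of K. The exact estimators studied in McKean and Schrader (1980b) are not specified here, and the function names, sample size, and replication counts are illustrative.

```python
import numpy as np

def bootstrap_se_median(x, n_boot=2000, rng=None):
    """Bootstrap standard error of the sample median (resampling with replacement)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    return float(np.median(x[idx], axis=1).std(ddof=1))

def k_hat(x, **kwargs):
    """Estimate of K, using the approximation se(median) ~ K / sqrt(n)."""
    return float(np.sqrt(len(x)) * bootstrap_se_median(x, **kwargs))

def studentized_median(x, eta0, **kwargs):
    """(sample median - hypothesized median) / bootstrap standard error."""
    return float((np.median(x) - eta0) / bootstrap_se_median(x, **kwargs))

def tail_rate(n=15, n_rep=300, cutoff=1.96, seed=0):
    """Monte Carlo proportion of |T| above a standard normal cutoff, normal errors."""
    rng = np.random.default_rng(seed)
    t = [studentized_median(rng.standard_normal(n), 0.0, n_boot=500, rng=rng)
         for _ in range(n_rep)]
    return float(np.mean(np.abs(t) > cutoff))

if __name__ == "__main__":
    print(tail_rate())  # compare informally with the nominal 0.05
```

Comparing the empirical tail rate with the nominal level for several error distributions and sample sizes is the kind of check a study of this type would run, on a far larger scale than this sketch.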

We proceeded to an in-depth study of the robust F-test for $\ell_1$ estimates in
Schrader and McKean (1981). The estimates of $A$ which performed well in the
study of the median were employed in this study. We also examined a large number
of both numerator and denominator degrees of freedom, both of which should exert
considerable influence on the behavior of the procedure. In this study we
discovered that both quadratic forms in $\ell_1$ estimates and $F_{LS}$ can be so unstable
for certain error distributions that they are almost impossible to standardize
adequately. The ratio based upon reduction in absolute errors behaved in a
reasonably stable manner, by contrast. Also surprising for this ratio, as it was
for the median, was that the asymptotic distribution (chi-squared) is the
appropriate standardizing distribution; again, one would expect a more direct
influence of denominator degrees of freedom.
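
The ratio based on reduction in absolute errors is easiest to see in the simple location problem, where the full model fits the sample median and the reduced model fixes the location at a hypothesized value. The sketch below is an illustration under stated assumptions, not the regression study reported above: it uses one numerator degree of freedom, standard normal errors, and the true value of A = [4f(eta)]^{-1} in place of an estimate, and it compares the ratio with the 0.95 quantile of the chi-squared distribution on one degree of freedom (3.841). All names are hypothetical.

```python
import numpy as np

def l1_location_ratio(x, eta0, a_hat):
    """[sum|x_i - eta0| - sum|x_i - median|] / a_hat, with one numerator d.f."""
    x = np.asarray(x, dtype=float)
    d_reduced = np.sum(np.abs(x - eta0))        # location fixed at eta0
    d_full = np.sum(np.abs(x - np.median(x)))   # location fitted by the median
    return float((d_reduced - d_full) / a_hat)

def rejection_rate(n=20, n_rep=2000, seed=0):
    """Monte Carlo rejection rate under the null hypothesis for normal errors."""
    rng = np.random.default_rng(seed)
    a_true = np.sqrt(2.0 * np.pi) / 4.0  # [4 f(0)]^{-1} for the standard normal
    crit = 3.841                         # chi-squared(1) 0.95 quantile
    stats = [l1_location_ratio(rng.standard_normal(n), 0.0, a_true)
             for _ in range(n_rep)]
    return float(np.mean(np.array(stats) > crit))

if __name__ == "__main__":
    print(rejection_rate())  # compare informally with the nominal 0.05
```

Replacing the true A with an estimate, for example half of an estimate of K, introduces the additional variability whose effect the small sample study was designed to assess.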

We  developed  efficient  and  stable  algorithms  to  perform  robust  analysis  for 
general  non-full  rank  linear  models;  see  McKean  and  Schrader  (1981).  Extensive 
use  was  made  of  the  state-of-the-art  numerical  software  package  LINPACK 
(Dongarra  et  al.,  1979). 


In  a  comparison  between  classical  outlier  detection  methods  (John,  1978)  and 
robust  methodology,  we  demonstrated  that  robust  methods  perform  automatically  the 
desired  outlier  detection  and  inference  (McKean  and  Schrader,  1981).  Classical 
methods  are  much  more  complicated  and  subjective,  and  are  less  reliable. 

The statistic $F_R$ is only one of several proposals for an R-estimate based
analysis of variance. Hettmansperger and McKean (1981) presented a unified
geometric approach to various of these methods. In a Monte Carlo study they
demonstrated that $F_R$ behaved most stably across error distributions and design
configurations.